Abstract

The increasing interest in renewable energy, particularly in wind, has given
rise to the need for accurate models for generating realistic synthetic
wind speed data. Markov chains are often used for this purpose, but better
models are needed to reproduce the statistical properties of wind speed
data. In a previous paper we showed that semi-Markov processes are more
appropriate for this purpose, but that high-order models are needed to
reproduce the features of real data accurately. In this work we introduce an
indexed semi-Markov process that is able to fit real data. We downloaded
a freely available database containing wind speed
data recorded at the L.S.I.-Lastem station (Italy) and sampled every 10 minutes.
We then generate synthetic wind speed time series by means of Monte
Carlo simulations. The time-lagged autocorrelation is then used to compare
the statistical properties of the proposed model with those of real data and also
with those of a synthetic time series generated through a simple semi-Markov process.

Keywords: indexed semi-Markov process, synthetic time series, 
autocorrelation, Monte Carlo simulation 



1. Introduction 

Wind is one of the most popular renewable energy sources. Wind
turbines transform the kinetic energy of the wind into the electrical
energy that we use daily. The rotor of a wind turbine
captures the kinetic energy of the mass of air that passes through its swept area;
this drives the rotation of the rotor and, consequently, the production of
electrical energy. A wind turbine is a very complex system composed of
many mechanical parts, electrical parts and control electronics. Most of these
components, especially the mechanical ones, need to be designed against fatigue
failure, which is caused by the cyclic behavior of the wind. A powerful stochastic
model that allows the generation of synthetic wind speed data can help the
design task. Sometimes, in fact, it is precisely the lack of knowledge of the loads applied
to the system that makes its design difficult. The ability to estimate the next
value of the wind speed, given its current value, can help the control system
of a common wind turbine to recover more energy by rotating
the blades according to the increasing wind speed while, at the same time,
keeping the rotor speed constant.

To generate synthetic data, Markov chains of first or higher order are
often used [1, 2, 3]. In particular, [1] presents a comparison between a
first-order Markov chain and a second-order Markov chain. A similar work,
but only for the first-order Markov chain, is conducted in [2], which presents
the transition probability matrix and compares the energy spectral density
and autocorrelation of real and synthetic wind speed data. An attempt to
model wind speed and direction jointly is presented in [3], using
two models: a first-order Markov chain with different numbers of states, and a
Weibull distribution.

All these models use Markov chains to generate synthetic wind speed
time series, but the search for a better model is still open. Approaching this
issue, in a previous work [4] we proposed different second order semi-Markov
models for the generation of synthetic wind speed time series. By comparing
the autocorrelation functions of real and synthetic data, we showed
that semi-Markov models can reproduce this feature of real data, but that the
autocorrelation drops to zero faster than for real data. In order to overcome
the problem of low autocorrelation, in this paper we propose an indexed
semi-Markov model for wind speed modeling. More precisely, we assume that wind
speed is described by a discrete time homogeneous semi-Markov process in
which we introduce a memory index that takes into account the periods of
high and low wind speed.

The paper is organized as follows. First we present the model and
the equations governing the process. Then we introduce the database used
and, by applying Monte Carlo simulations, compare features of the real data
with those of synthetic data from the models.

2. Wind speed semi-Markov model with memory 

In this section we give a new statistical model of wind speed and,
consequently, propose a novel forecasting method based on a particular indexed
semi-Markov model in discrete time. Indexed semi-Markov processes have
been proposed recently in [5] and [6]. The first paper addresses different
theoretical aspects, while the second considers a different indexed
process, which is shown to be able to describe the persistence in financial data.

Here we consider a specific model able to capture the persistence in
wind speed data. This is an important task, since models based on Markov
chains have been shown to fit the observed autocorrelation function
poorly. To improve the fit, we recently proposed [4] a higher
order semi-Markov chain model. However, even with a second order semi-Markov
chain the autocorrelation, although significantly improved, is still too far
from that of the observed time series. Obviously an increase of the order would
improve the results, but the number of parameters and the computational
effort would grow dramatically. For these reasons, here we consider a more
parsimonious approach, which proves to be very efficient, and we show how
to implement it.

Let $(\Omega, \mathcal{F}, P)$ be a probability space and consider the stochastic process

$$J_{-(m+1)}, J_{-m}, \ldots, J_{-1}, J_0, J_1, \ldots$$

with a finite state space $E = \{1, 2, \ldots, S\}$. In our framework the random
variable $J_n$ describes the wind speed at the $n$-th transition.
Let us consider the stochastic process

$$T_{-(m+1)}, T_{-m}, \ldots, T_{-1}, T_0, T_1, \ldots$$

with values in $\mathbb{N}$. The random variable $T_n$ describes the time at which the
$n$-th transition of the wind speed occurs. We denote by $\{X_n\}_{n \in \mathbb{N}}$ the
stochastic process in which $X_n$ is the sojourn time in state $J_n$ before the
$(n+1)$-th jump. Thus, for all $n \in \mathbb{N}$, we have $X_n = T_{n+1} - T_n$.
Let us consider also the stochastic process

$$U_{-(m+1)}^m, U_{-m}^m, \ldots, U_{-1}^m, U_0^m, U_1^m, \ldots$$

with values in $\mathbb{R}$. The random variable $U_n^m$ describes the value of the index
process at the $n$-th transition:

$$U_n^m = \sum_{k=0}^{m} \frac{\sum_{s=T_{n-1-k}}^{T_{n-k}-1} f(J_{n-1-k}, s)}{T_n - T_{n-(m+1)}}, \tag{1}$$

where $f : E \times \mathbb{N} \to \mathbb{R}$ is a Borel measurable bounded function and
$U_{-(m+1)}^m, \ldots, U_0^m$ are known and non-random.

The process $U_n^m$ can be interpreted as a moving average of the accumulated
reward process, with the function $f$ as a measure of the permanence
reward per unit time.

The function $f$ depends on the state of the system $J_{n-1-k}$ and on the
time $s$.

It should be noted that the order of the moving average is on the number 
of transitions. As a consequence, the moving average is executed on time 
windows of variable length. 
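
As a minimal sketch of how the index in (1) could be evaluated, the function below sums $f$ over the last $m + 1$ sojourns and divides by the length of the corresponding time window. The function name `index_value`, the list-based layout (list position 0 plays the role of the earliest known transition, instead of the negative indices used in the text) and the example numbers are assumptions made for illustration only.

```python
from typing import Callable, Sequence


def index_value(states: Sequence[int], times: Sequence[int], n: int, m: int,
                f: Callable[[int, int], float]) -> float:
    """Evaluate U_n^m as in (1).

    states[i] is the state entered at integer time times[i]; it is held on
    the time steps times[i], ..., times[i + 1] - 1.
    """
    if n - (m + 1) < 0:
        raise ValueError("need at least m + 1 completed sojourns before n")
    total = 0.0
    for k in range(m + 1):                    # the last m + 1 sojourns
        i = n - 1 - k
        for s in range(times[i], times[i + 1]):
            total += f(states[i], s)          # reward accumulated at time s
    # The window length varies because it counts transitions, not time steps.
    return total / (times[n] - times[n - (m + 1)])


# Hypothetical trajectory: states 2, 5, 3 held for 3, 1 and 4 time steps.
states = [2, 5, 3, 4]
times = [0, 3, 4, 8]
u = index_value(states, times, n=3, m=2, f=lambda j, s: j)  # 23 / 8 = 2.875
```

With the same trajectory and $m = 0$ only the last sojourn enters the average, so the window shrinks from 8 to 4 time steps, which illustrates the variable window length noted above.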

The indexed model is fully specified once the dependence structure be- 
tween the variables is assumed. Toward this end we adopt the following 
assumption: 



$$
\begin{aligned}
& P[J_{n+1} = j,\ T_{n+1} - T_n \le t \mid \sigma(J_h, T_h, U_h^m),\ h = -m, \ldots, 0, \ldots, n,\ J_n = i,\ U_n^m = v] \\
& \quad = P[J_{n+1} = j,\ T_{n+1} - T_n \le t \mid J_n = i,\ U_n^m = v] := Q_{ij}^m(v; t),
\end{aligned} \tag{2}
$$

where $\sigma(J_h, T_h, U_h^m)$, $h \le n$, is the natural filtration of the three-variate process.

The matrix of functions $Q^m(v; t) = (Q_{ij}^m(v; t))_{i,j \in E}$ is called the indexed semi-Markov
kernel.

The joint process $(J_n, T_n)$, which is embedded in the indexed semi-Markov
kernel, depends on the moving average process $U_n^m$; the latter acts as a
stochastic index. Moreover, the index process $U_n^m$ depends on $(J_n, T_n)$ through
the functional relationship (1).

To describe the behavior of our model at whatever time $t$ we need to
define additional stochastic processes.

Given the three-dimensional process $\{J_n, T_n, U_n^m\}$ and the indexed
semi-Markov kernel $Q^m(v; t)$, we define



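$$
\begin{aligned}
N(t) &= \sup\{n \in \mathbb{N} : T_n \le t\}; \\
Z(t) &= J_{N(t)}; \\
U^m(t) &= \sum_{k=0}^{m} \frac{\sum_{s=T_{(N(t)-\theta)-k}}^{(t \wedge T_{(N(t)-\theta)+1-k})-1} f(J_{(N(t)-\theta)-k}, s)}{t \wedge T_{(N(t)-\theta)+1} - T_{(N(t)-\theta)-m}},
\end{aligned} \tag{3}
$$

where $T_{N(t)} \le t < T_{N(t)+1}$ and $\theta = 1_{\{t = T_{N(t)}\}}$.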

The stochastic processes defined in (3) represent the number of transitions
up to time $t$, the state of the system (wind speed) at time $t$ and the value
of the index process (moving average of a function of wind speed) up to $t$,
respectively. We refer to $Z(t)$ as an indexed semi-Markov chain.

The process $U^m(t)$ extends the process $U_n^m$ to the case where time $t$ can
be a transition or a waiting time. It is simple to realize that, for all $m$, if $t = T_n$
we have that $U^m(t) = U_n^m$.
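
The three processes in (3) can be sketched in a few lines, assuming the transition times are stored in an increasing list and taking $f$ equal to the state value (an illustrative choice). The function names and the example trajectory are hypothetical; list position 0 stands for the earliest known transition.

```python
from bisect import bisect_right


def counting_process(times, t):
    """N(t): position of the last transition with T_n <= t."""
    n = bisect_right(times, t) - 1
    if n < 0:
        raise ValueError("t precedes the first recorded transition")
    return n


def state_at(states, times, t):
    """Z(t) = J_{N(t)}: the state occupied at time t."""
    return states[counting_process(times, t)]


def index_at(states, times, t, m):
    """U^m(t) with f(j, s) = j, following the structure of (3)."""
    n = counting_process(times, t)
    theta = 1 if times[n] == t else 0         # 1 when t is a transition time
    base = n - theta                          # N(t) - theta
    if base - m < 0:
        raise ValueError("not enough history for memory m")
    total = 0.0
    for k in range(m + 1):
        # min(t, T_{base+1-k}) equals t for k = 0 and T_{base+1-k} otherwise
        upper = t if k == 0 else times[base + 1 - k]
        total += states[base - k] * (upper - times[base - k])
    return total / (t - times[base - m])


# Hypothetical trajectory: states 2, 5, 3, 4 entered at times 0, 3, 4, 8.
states = [2, 5, 3, 4]
times = [0, 3, 4, 8]
```

At the transition time $t = 8$ the function returns the same value as the index at the third transition, in agreement with the remark that $U^m(t) = U_n^m$ when $t = T_n$.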

In the papers [5] and [6], explicit renewal-type equations were given to
describe the probabilistic behavior of the indexed semi-Markov process.
We do not report here those results applied to our model because, in the
implementation of the model given in the next section, we follow a Monte Carlo
simulation based approach.

3. Application to real data 

To check the validity of our model we perform a comparison of the behav- 
ior of real data and wind speeds generated through Monte Carlo simulations 
based on the model described above. In this section we describe the database 
of real data used for the analysis, the method used to simulate synthetic wind 
speed time series and, at the end, we compare results from real and simulated 
data. 

3.1. Database 

The data used in this analysis are freely available from
http://www.lsi-lastem.it/meteo/page/dwnldata.aspx. The L.S.I.-Lastem station is
situated in Italy at N 45° 28' 14.9", E 9° 22' 19.9", at an altitude of 107 m.
The station uses a combined speed-direction anemometer at 22 m above the
ground. It has a measurement range from 0 to 60 m/s, a threshold
of 0.38 m/s and a resolution of 0.05 m/s. The station processes the speed
every 10 minutes over a time interval ranging from 25/10/2006 to 28/06/2011.
During each 10-minute interval 31 samples are taken and then averaged.
In this work we use the sampled data that represent
the average of the modulus of the wind speed (m/s), without considering a
specific direction. The database is thus composed of about 230 thousand
wind speed measurements ranging from 0 to 16 m/s. The time series, together
with its empirical probability density function (pdf), is shown in Figure 1.
It is interesting to note that the pdf closely resembles a Weibull
distribution. To be able to model the wind speed as a semi-Markov process, the state
space of wind speed has been discretized. In the example shown in this work
we discretized wind speed into 7 states chosen to cover the whole wind speed
distribution.

Figure 1: Time series of wind speed and its distribution.
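
The discretization step can be sketched as follows. The bin edges and the short speed series are hypothetical, since the state boundaries are not reported here, and times are counted in sampling steps (10 minutes each). The helper `to_transitions` collapses the sampled state series into the states entered and their entry times, which are the quantities a semi-Markov model works with.

```python
import numpy as np


def discretize(speed, edges):
    """Map each speed (m/s) to a state in 1..len(edges) + 1."""
    return np.digitize(speed, edges) + 1


def to_transitions(states):
    """Collapse a sampled state series into states entered and entry times."""
    states = np.asarray(states)
    change = np.flatnonzero(np.diff(states) != 0) + 1   # steps where the state changes
    times = np.concatenate(([0], change))
    return states[times], times


edges = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0])        # hypothetical edges, 7 states
speed = np.array([0.4, 0.7, 1.3, 1.1, 3.6, 3.9, 3.2, 8.5])
sampled_states = discretize(speed, edges)                # [1 1 2 2 4 4 4 7]
J, T = to_transitions(sampled_states)                    # J = [1 2 4 7], T = [0 2 4 7]
```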

3.2. Monte Carlo simulations 

In the model described in the previous section, and in particular in the
definition of the index process $U_n^m$, the function $f : E \times \mathbb{N} \to \mathbb{R}$ is any Borel
measurable bounded function. To perform simulations we choose a function
motivated essentially by simplicity: we want to keep the model as
simple as possible.

Let us briefly recall that wind speed data are long-range positively
autocorrelated. This implies that there are periods of high and low speed.
Motivated by this empirical fact, we suppose that the transition
probabilities also depend on whether the wind is, on average, in a high speed period
or in a low one. We then fixed the function $f$ to be the wind speed itself, i.e.
$f(J_{n-1-k}, s) = J_{n-1-k}$ for all $s \in \mathbb{N}$. Consequently we obtain



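$$U_n^m = \frac{1}{T_n - T_{n-(m+1)}} \sum_{k=0}^{m} J_{n-1-k} \cdot X_{n-1-k} = \sum_{k=0}^{m} J_{n-1-k} \left( \frac{T_{n-k} - T_{n-1-k}}{T_n - T_{n-(m+1)}} \right). \tag{4}$$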

In this simple case the index process expresses a moving average of order 
m + 1 executed on the series of the wind speed values with weights given by 
the fractions of sojourn times in that wind speed with respect to the interval 
time on which the average is executed. 
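
As a small worked example of equation (4), with hypothetical values chosen only for illustration, take $m = 2$ and $n = 3$, states $J_0 = 2$, $J_1 = 5$, $J_2 = 3$ and transition times $T_0 = 0$, $T_1 = 3$, $T_2 = 4$, $T_3 = 8$, so that the sojourn times are 3, 1 and 4:

```latex
\begin{aligned}
U_3^2 &= \sum_{k=0}^{2} J_{2-k}\,\frac{T_{3-k} - T_{2-k}}{T_3 - T_0}
       = \frac{3 \cdot 4 + 5 \cdot 1 + 2 \cdot 3}{8}
       = \frac{23}{8} = 2.875 .
\end{aligned}
```

The unweighted mean of the three states would be $10/3 \approx 3.33$; the index is lower because the highest state is occupied for only one time step out of eight.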



Note that the memory $m$ is expressed as a number of transitions. The index $U_n^m$
obtained from the given definition of $f$ was also discretized into 5 states: low,
medium low, medium, medium high and high speed.

According to these choices we estimated, from real data, the probabilities
$Q_{ij}^m(v; t)$ defined in formula (2) for different values of $m$. For the results
shown below $m$ was chosen to run from 1 to 30 transitions.
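
One simple way to obtain such estimates is by relative frequencies, sketched below. This is an assumption for illustration and not necessarily the estimator used for the results reported here. The inputs are the states entered `J`, their entry times `T` and the discretized index level `V` at each transition; all names are hypothetical.

```python
from collections import defaultdict


def estimate_counts(J, T, V):
    """Count observed transitions.

    counts[(i, v)][(j, x)] is the number of times state i, with index level v,
    was followed by state j after a sojourn of x time steps.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for n in range(len(J) - 1):
        counts[(J[n], V[n])][(J[n + 1], T[n + 1] - T[n])] += 1
    return counts


def kernel(counts, i, v, j, t):
    """Frequency estimate of P[next state = j, sojourn <= t | state i, level v]."""
    row = counts.get((i, v))
    if not row:
        return 0.0                            # pair (i, v) never observed
    total = sum(row.values())
    hits = sum(c for (nxt, x), c in row.items() if nxt == j and x <= t)
    return hits / total


# Hypothetical transition record with a single index level (0).
J = [1, 2, 1, 2, 1, 3]
T = [0, 2, 3, 6, 7, 9]
V = [0, 0, 0, 0, 0, 0]
counts = estimate_counts(J, T, V)
```

In this toy record state 1 is left three times: twice towards state 2 (after 2 and 3 steps) and once towards state 3 (after 2 steps), so `kernel(counts, 1, 0, 2, 3)` evaluates to 2/3.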

These probabilities were then used to simulate the synthetic wind speed time
series needed to compare results from real data and the model, as described
in the next section. Note that these are step-by-step simulations in which
the index has to be calculated from the last $m$ simulated transitions.
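
A step-by-step simulation of this kind might look like the sketch below, under several assumptions: transition frequencies are stored as `counts[(i, v)] = {(j, x): frequency}`, the index uses $f$ equal to the state value, `level` is a user-supplied function that discretizes the index, and the simulation stops if it reaches a (state, level) pair never seen in the data. All names are hypothetical.

```python
import random


def index_from_history(states, sojourns, m):
    """Sojourn-time-weighted mean of the last m + 1 completed states."""
    s, x = states[-(m + 1):], sojourns[-(m + 1):]
    return sum(a * b for a, b in zip(s, x)) / sum(x)


def simulate(counts, init_states, init_sojourns, current, m, level, n_steps, rng):
    """Generate up to n_steps (state, sojourn) pairs.

    init_states / init_sojourns hold the m + 1 completed sojourns that
    precede the state `current` in which the simulation starts.
    """
    states, sojourns = list(init_states), list(init_sojourns)
    path = []
    for _ in range(n_steps):
        # The index is recomputed at every step from the simulated history.
        v = level(index_from_history(states, sojourns, m))
        row = counts.get((current, v))
        if not row:
            break                             # unseen (state, level) pair
        outcomes = list(row)
        nxt, x = rng.choices(outcomes, weights=[row[o] for o in outcomes])[0]
        path.append((current, x))
        states.append(current)
        sojourns.append(x)
        current = nxt
    return path


def expand(path):
    """Turn (state, sojourn) pairs into a series sampled at every time step."""
    return [state for state, x in path for _ in range(x)]


# Deterministic toy frequencies, used only to exercise the code.
toy_counts = {(1, 0): {(2, 3): 1}, (2, 0): {(1, 1): 1}}
toy_path = simulate(toy_counts, [1, 2], [1, 1], current=1, m=1,
                    level=lambda u: 0, n_steps=4, rng=random.Random(0))
```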

3.3. Results on autocorrelation function

A very important feature of wind speed data is that they are long-range
correlated. It is therefore very important that theoretical models reproduce
this feature. We tested our model to check whether it is able to reproduce
such behavior. Given the presence of the parameter $m$ in the index function,
we also tested the autocorrelation behavior as a function of $m$.

Figure 2: Autocorrelation functions of real data (solid line) and of 4 synthetic time series
as described in the label. In the left panel wind speed is measured every 10 minutes while
in the right panel every hour.



If $Z$ indicates wind speed, the time-lagged ($\tau$) autocorrelation of wind
speed is defined as

$$\Sigma(\tau) = \frac{\mathrm{Cov}(Z(t + \tau), Z(t))}{\mathrm{Var}(Z(t))}.$$
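
A sample version of this quantity can be computed as in the following sketch, which assumes a stationary series and averages the lagged products over the available pairs. The normalization choice and the function name are assumptions; the text does not specify which sample estimator was used.

```python
import numpy as np


def autocorrelation(z, max_lag):
    """Sample estimate of Cov(Z(t + tau), Z(t)) / Var(Z(t)) for tau = 1..max_lag."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()                          # deviations from the sample mean
    var = np.mean(d * d)
    return np.array([np.mean(d[tau:] * d[:-tau]) / var
                     for tau in range(1, max_lag + 1)])


# A perfectly alternating series is anti-correlated at lag 1 and correlated at lag 2.
acf = autocorrelation([1, -1] * 50, max_lag=2)   # [-1.0, 1.0]
```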

We tested our model on two different time scales of wind speed, one in which
data are measured every 10 minutes and one in which data are measured
every hour. The time lag $\tau$ was made to run from 1 minute up to 1000
minutes in the first case and from 1 hour to 50 hours in the second one. Note
that, to be able to compare results for $\Sigma(\tau)$, each simulated time series was
generated with the same length as the real data. As expected (see Figure 2), real
data do show a long-range correlation with a sinusoidal behavior; the latter
is due to the simple seasonal effect of the 24-hour Earth cycle. Let us then analyze
the results for the synthetic time series. The simple semi-Markov model starts at
the same value, but the persistence is very short and after a few time steps the
autocorrelation decreases to zero. A very interesting behavior is instead shown
by the semi-Markov models with memory index. If a small memory ($m = 3$
in the example shown) is used, the autocorrelation is already persistent but
again decreases faster than for real data. With a longer memory ($m$ = 7/8) the
autocorrelation remains high for a very long period and its value is also very
close to that of real data. If $m$ is increased further, the autocorrelation drops
again to small values. This behavior suggests the existence of an optimal
memory $m$. In our opinion, one can justify this behavior by saying that short
memories are not enough to identify which speed regime the wind is in, while too
long memories mix together different regimes, so that much of the information
is lost in the average. All this is shown in Figure 3, where the mean square
error between the autocorrelation function of each simulated time series and the
autocorrelation function of the real data is computed as a function of $m$. It
can be noticed that there exists an optimal value of the memory $m$ that makes
the autocorrelation of simulated data closest to that of real data.

Figure 3: Mean square error between autocorrelation function of real data and synthetic
data as a function of the memory value m. As in Figure 2, in the left panel wind speed is
measured every 10 minutes while in the right panel every hour.
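
The selection of the memory by mean square error can be sketched as below. The autocorrelation values in the example are made-up numbers used only to exercise the code and are not results from the data; the function names are hypothetical.

```python
import numpy as np


def acf_mse(acf_real, acf_synth):
    """Mean square error between two autocorrelation functions on the same lags."""
    a = np.asarray(acf_real, dtype=float)
    b = np.asarray(acf_synth, dtype=float)
    return float(np.mean((a - b) ** 2))


def best_memory(acf_real, acf_by_m):
    """Memory m whose synthetic autocorrelation is closest to the real one."""
    return min(acf_by_m, key=lambda m: acf_mse(acf_real, acf_by_m[m]))


# Made-up values for three lags, for illustration only.
acf_real = [0.9, 0.8, 0.7]
acf_by_m = {3: [0.9, 0.5, 0.2], 7: [0.9, 0.8, 0.6], 20: [0.8, 0.4, 0.1]}
m_star = best_memory(acf_real, acf_by_m)      # 7 for these made-up values
```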

4. Conclusion 

Wind is a very unstable phenomenon characterized by a sequence of
lulls and sustained speeds, and a good synthetic wind speed generator must be able to
reproduce such sequences. In the present work we propose a new stochastic model
to simulate synthetic wind speed data. Starting from our previous paper [4],
where first and second order semi-Markov processes were used, we introduce a
memory-indexed semi-Markov process which, by considering only the moving
average of past wind speed, is able to faithfully reproduce the statistical
characteristics of wind speed. To check the validity of the predictive
semi-Markovian model, the persistence of the synthetic wind speed series was
calculated and then averaged.